
# Replication File for "Does a Spoonful of Sugar Levy Help the Calories Go Down? An Analysis of the UK Soft Drinks Industry Levy" 

By Alex Dickson, Markus Gehrsitz, and Jonathan Kemp

Date: 23 March 2023

Author: Markus Gehrsitz



Operating systems:  Microsoft Windows 10 Enterprise, All Analyses were done in Stata MP 17.0

Relative paths are provided in 000_Masterfile


## Data Instructions:

Note that the brand-level data cannot be made publicly available as they are commercially sensitive.

The original data can be obtained from IRI (Information Resources, Inc.). Please contact their sales team if you are interested in purchasing the data.
The brands used for the study are listed in the reformdates spreadsheet (among others)



The following cleaningfiles take the raw data, clean it, and slice & dice it into sub-datafiles that can then be used for analysis and the replication
of all tables and figures in the paper.

Below is a short description:

01_loading_BrandData: 
- loads the brand-level data which is outputted in Excel into Stata
- After destringing variables and creating stata dates, it saves 3 separate datafiles:
	1) branddata_all_20142020 which contains at the brand-level Value, Volume, and Price per Litre over time
	2) branddata_drinknow_20142020 which contains at the brand-level Value, Volume, and Price per Litre over time but for just small containers ("drink now")
	3) branddata_takehome_20142020 which contains at the brand-level Value, Volume, and Price per Litre over time but for just large containers ("take home")


02_loading_Categorydata - redundant, not relevant for published version. (not provided)

03_weatherdata: just loads in weather data and again destrings and creates date variables. 


04_CleaningBrandData: Back to our brand-level datasets
- clean this up properly (leading blanks etc.)
- Generate dummies for our categories, in particular the following four:
	1) Levied
	1a) Levied Colas
	1b) Levied Energy
	1c) Diet-Versions of Levied Colas
	1d) Diet-Version of Levied Energy

	2) Reformulator

These are all just extra columns but rather than overwriting the files created in 01, we save new version


05_MatchingProcessingBrandData: This is really just a continuation of the 04-file 
We create a few extra dummy categories for:
 3) diet (by way of a merge)
 4) levy-excempt -> which has the slightly counterintuive name "exclDiet". It is the subset of the non-levied products that is not a diet-products
We also merge the weather data in.


06_ReformulatorListImport: Really just imports the list of reformulating brands and saves in Stata format as "reformdates"

07_CreateSubDataasets_BrandLEvel_Loop: This now aggregates the brand-level data up to larger categories (so we can create the charts). We simply loop over the
categories that were created in 04 and 05. The end result are a series of aggregted datasets that we call "National" because they reflect the sales in a category
at the National level.
For example: NationalData_takehome_levied_cola gives for each of the 288 weeks in our data the total sales volumes for large containers of colas that fell under
the levy. See data dictionary for other subcategories.


08_PrepareDiffinDiffData: Here we go back to the brand-level datasets created in 05. We merge in the reformulation dates from 06
- Then we rate a series of 4-week leads and lags.
- We also rescale a bit and take logs
The resulting dataset is called dataDD as it is the dataset that can then be used for our difference-in-difference analysis.
Note that these data are not seasonally adjusted because we will include week fixed-effects in the DD-analysis


09a_PreparePriceBreakData: HEre we now take the "National" subdatasets created in 07 and seasonally adjust them
Rather than overwriting the National-files, we save them with suffix _seasonallyadj
For example,  NationalData_levied_cola  becomes NationalData_levied_cola_seasonallyadj
Same for other files.
Note: Here the datasets for Figure 3 are created!


09b_PreparePriceBreakData does the same exercise as 09a for takehome and drinknow datasets


10a_PreparePriceBReakData_BrandLevel: We now also seasonally adjust the brand-level data


10b_PreparePriceBReakData_BrandLevel: We now also seasonally adjust the brand-level data but separately for takehome and drinknow


11 - redundant, not relevant for published version. (not provided)


12_Sugarcontent: This just takes the brandlevel data (which contains calorie-content) and recodes it for each of the 100 brands into sugar content
In other words, we have a list for our top 100 brands on their sugar content both before the announcement of the levy and after it went into effect.

This dataset is then used to create Figures 1 and 2!


13_RedoPreWeight: During the revision process, a referee asked for the analysis for Figures 5 to be run but volume-weighted by pre-levy sales.
In other words, price per litre for each subcategory is calculated holding the pre-levy mix of a category constant. In practice, this did not
make a difference, but these are the final results reported.
So what this dofile does is take the brand-level data and within each subgroup (e.g. levied colas in large containers) calculates the market share
of each brand contributing to this subgroup. That is the market share prior to the levy going into effect.
We then recalculate the price-time series as weighted means and add them back to the volume and calorie time series.
We repeat the seasonal adjustment, but here separately for the pre-levy and post-levy periods (structural break)

At the end we have a separate dataset for levied colas, levied energies, diet versions of levied colas, diet version of levied energy, all diet products, and all levy-exempt products,
and those again split by takehome and drinknow. So 12 datasets overall
Note: These 12 datasets are then used to create Figures 5a-d!





The analysis files are all pretty self-explanatory. 

Figure2.do creates Figure2, Figure3.do creates Figure3, etc.


All tables and figures are replicable from the datasets provided with the exception of the event-studies.
The event-studies are created from commercially sensitive brand-level data which obviously cannot be put into the public domain.
We have, however, outlined above how our code conveniently transforms brand-level data obtained from IRi into the dataDD file that our event-study
analysis is based on.

Note that for the analysis-files, the paths have been rewritten so that they run with the data provided in the replication package. In other words,
all you have to do is place the data in a replicationdata-folder and the code will generate the respective outputs.
The original paths have merely been commented out, so if you replicate with the original brand-level data, all you have to do is at the top of each
file is to remove the replicationdata-folder path and comment the original paths back in.



